Human diseases and symptoms

A bipartite network of human diseases and their associated symptoms, extracted from the PubMed biomedical literature (circa 2014) using TF-IDF-weighted co-occurrences of terms in the MeSH metadata field. Edges are weighted by the TF-IDF score and the PubMed co-occurrence count, and nodes are labeled with the corresponding MeSH disease or symptom term.

Paper: Human symptoms–disease network

Libraries

In [1]:
# Installing packages
In [2]:
!pip install python_louvain
Requirement already satisfied: python_louvain in /home/dennishnf/miniconda3/envs/master-disc/lib/python3.6/site-packages (0.15)
Requirement already satisfied: numpy in /home/dennishnf/.local/lib/python3.6/site-packages (from python_louvain) (1.19.5)
Requirement already satisfied: networkx in /home/dennishnf/miniconda3/envs/master-disc/lib/python3.6/site-packages (from python_louvain) (2.5)
Requirement already satisfied: decorator>=4.3.0 in /home/dennishnf/miniconda3/envs/master-disc/lib/python3.6/site-packages (from networkx->python_louvain) (5.1.0)
In [3]:
# Importing libraries
In [4]:
import collections
import json
import statistics as stats
import time

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
import seaborn as sns
from community import community_louvain  # installed by the python-louvain package
from networkx.algorithms import bipartite
In [5]:
# disable auto-scrolling
In [6]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

Exploratory Data Analysis and Pre-Processing

Supplementary Data 1

List of all 4,442 diseases within PubMed and their occurrence. (TXT 113 kb)

Supplementary Data 2

List of all 322 symptoms within PubMed and their occurrence. (TXT 6 kb)

Supplementary Data 3

Term co-occurrences between symptoms and diseases measured by TF-IDF weighted values. This table includes 147,978 records of symptom and disease relationships. (TXT 7797 kb)

Supplementary Data 4

List of disease links in the disease network with both significant shared symptoms and shared genes/PPIs. In total there are 133,106 such connections between 1,596 distinct diseases. (TXT 7801 kb)

In [7]:
data1 = pd.read_csv("data/41467_2014_BFncomms5212_MOESM1043_ESM.txt", delimiter = "\t")
data2 = pd.read_csv("data/41467_2014_BFncomms5212_MOESM1044_ESM.txt", delimiter = "\t")
data3 = pd.read_csv("data/41467_2014_BFncomms5212_MOESM1045_ESM.txt", delimiter = "\t")
data4 = pd.read_csv("data/41467_2014_BFncomms5212_MOESM1046_ESM.txt", delimiter = "\t")
In [8]:
data1
Out[8]:
MeSH Disease Term PubMed occurrence
0 Breast Neoplasms 122226
1 Hypertension 107294
2 Coronary Artery Disease 82819
3 Lung Neoplasms 78009
4 Myocardial Infarction 75945
... ... ...
4437 Mannosidase Deficiency Diseases 1
4438 White Heifer Disease 1
4439 Tetrasomy 1
4440 Milk Sickness 1
4441 Intrauterine Device Migration 1

4442 rows × 2 columns

In [9]:
data2
Out[9]:
MeSH Symptom Term PubMed occurrence
0 Body Weight 147857
1 Pain 103168
2 Obesity 100301
3 Anoxia 47351
4 Mental Retardation 43883
... ... ...
317 Alien Hand Syndrome 10
318 Necrolytic Migratory Erythema 7
319 Body Weight Changes 4
320 Slit Ventricle Syndrome 3
321 Infantile Apparent Life-Threatening Event 2

322 rows × 2 columns

In [10]:
data3
Out[10]:
        MeSH Symptom Term  MeSH Disease Term                       PubMed occurrence  TFIDF score
0       Aging, Premature   Respiratory Syncytial Virus Infections  1                  3.464551
1       Aging, Premature   Orthomyxoviridae Infections             1                  3.464551
2       Aging, Premature   HIV Infections                          3                  10.393654
3       Aging, Premature   Acquired Immunodeficiency Syndrome      3                  10.393654
4       Aging, Premature   Breast Neoplasms                        1                  3.464551
...     ...                ...                                     ...                ...
147973  Hirsutism          Tobacco Use Disorder                    1                  2.483722
147974  Hirsutism          Radius Fractures                        1                  2.483722
147975  Hirsutism          Burns                                   1                  2.483722
147976  Hirsutism          Colles' Fracture                        1                  2.483722
147977  Hirsutism          Radiation Injuries                      1                  2.483722

147978 rows × 4 columns

In [11]:
data4
Out[11]:
        MeSH Disease Term                  MeSH Disease Term.1                      symptom similarity score
0       Histiocytoma, Benign Fibrous       Aneurysm                                 0.591937
1       Histiocytoma, Benign Fibrous       Carcinoma, Basal Cell                    0.310479
2       Arthropathy, Neurogenic            Corneal Dystrophies, Hereditary          0.133123
3       Arthropathy, Neurogenic            Foot Deformities, Congenital             0.156900
4       Hemangioendothelioma, Epithelioid  Thyroid Neoplasms                        0.157077
...     ...                                ...                                      ...
133101  Myopia                             Hypotrichosis                            0.140889
133102  Myopia                             Vitamin A Deficiency                     0.199410
133103  IgA Deficiency                     Intestinal Polyps                        0.142195
133104  IgA Deficiency                     Autoimmune Lymphoproliferative Syndrome  0.561311
133105  Torticollis                        Cryptorchidism                           0.104462

133106 rows × 3 columns

In [12]:
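# Rows are sorted by occurrence, so the row position is the frequency rank
# (pandas' scatter plot expects a numeric x column, not the term labels)
ax = data1.reset_index().plot(kind='scatter', x='index', y='PubMed occurrence', color='red', logy=True)
ax.set_xlabel("Disease rank")
plt.show()
In [13]:
ax = data2.reset_index().plot(kind='scatter', x='index', y='PubMed occurrence', color='red', logy=True)
ax.set_xlabel("Symptom rank")
plt.show()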
In [14]:
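# Co-occurrence counts of the first 1,000 records only (drop the slice to use all 147,978)
x = list(data3["PubMed occurrence"])[0:1000]
In [15]:
# Unit-width bins over 0-99; values above 99 are not counted
hist, bin_edges = np.histogram(x, bins=np.arange(0, 100))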
In [16]:
plt.plot(bin_edges[0:len(hist)], hist,'o');
plt.xlabel("Symptom and disease co-occurrence")
plt.ylabel("Number of records")
plt.title("Symptom and disease co-occurrence distribution")
Out[16]:
Text(0.5, 1.0, 'Symptom and disease co-occurrence distribution')
In [17]:
plt.loglog(bin_edges[0:len(hist)], hist,'o');
plt.xlabel("Symptom and disease co-occurrence")
plt.ylabel("Number of records")
plt.title("Symptom and disease co-occurrence distribution")
Out[17]:
Text(0.5, 1.0, 'Symptom and disease co-occurrence distribution')
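The unit-width histogram above leaves the tail sparse on the log-log plot, and its first bin edge at 0 cannot be drawn on a log axis. Logarithmically spaced bins, normalised by bin width, are a common alternative for heavy-tailed counts. The sketch below is a minimal version under that assumption; `log_binned_density` is a hypothetical helper, and the input is synthetic Zipf-distributed data, not the PubMed counts.

```python
import numpy as np


def log_binned_density(values, n_bins=20):
    """Histogram over logarithmically spaced bins, normalised by bin width.

    Assumes every value is >= 1 (e.g. co-occurrence counts). Returns the
    geometric centre of each non-empty bin and its probability density.
    """
    values = np.asarray(values, dtype=float)
    edges = np.logspace(0, np.log10(values.max() + 1), n_bins + 1)
    counts, edges = np.histogram(values, bins=edges)
    widths = np.diff(edges)
    centres = np.sqrt(edges[:-1] * edges[1:])
    density = counts / (widths * counts.sum())
    keep = counts > 0  # empty bins cannot be drawn on a log axis
    return centres[keep], density[keep]


# Synthetic heavy-tailed counts, for illustration only
rng = np.random.default_rng(0)
toy_counts = rng.zipf(2.0, size=5000)
centres, density = log_binned_density(toy_counts)
```

With the notebook's data the call would be `log_binned_density(data3["PubMed occurrence"])`, followed by `plt.loglog(centres, density, 'o')`.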
In [18]:
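# Keep only symptom-disease pairs that co-occur at least 100 times in PubMed
data_sd = data3[data3["PubMed occurrence"]>=100]
In [19]:
# Drop pairs whose symptom and disease are the same MeSH term
data_sd = data_sd[data_sd["MeSH Symptom Term"]!=data_sd["MeSH Disease Term"]]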
In [20]:
data_sd
Out[20]:
        MeSH Symptom Term  MeSH Disease Term          PubMed occurrence  TFIDF score
668     Fever              Bacterial Infections       651                402.928399
669     Fever              Bacteremia                 279                172.683599
682     Fever              Endocarditis, Bacterial    128                79.224017
800     Fever              Tuberculosis, Pulmonary    123                76.129329
816     Fever              Staphylococcal Infections  151                93.459582
...     ...                ...                        ...                ...
147641  Hirsutism          Polycystic Ovary Syndrome  471                1169.833173
147702  Hirsutism          Ovarian Neoplasms          109                270.725724
147817  Hirsutism          Hyperandrogenism           133                330.335057
147883  Hirsutism          Acne Vulgaris              118                293.079224
147895  Hirsutism          Hypertrichosis             1753               4353.965081

1708 rows × 4 columns
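The rows shown so far suggest how the TF-IDF column was built: within one symptom, the score is the co-occurrence count times a constant (3 × 3.464551 ≈ 10.393654 for "Aging, Premature", 471 × 2.483722 ≈ 1169.83 for "Hirsutism"). A standard TF-IDF form with that property is w = count × log(N / n_s), where N is the number of diseases and n_s the number of diseases that co-occur with symptom s. That formula is an assumption here, not read from the paper. The sketch checks the constant-ratio pattern on rows copied from the data3 and data_sd outputs; `tfidf_weight` and `idf_range_per_symptom` are hypothetical helpers.

```python
import math

import pandas as pd


def tfidf_weight(count, n_diseases, n_diseases_with_symptom):
    """Assumed TF-IDF form: co-occurrence count times log(N / n_s)."""
    return count * math.log(n_diseases / n_diseases_with_symptom)


def idf_range_per_symptom(records):
    """Min and max of score / count per symptom; min == max means score = count * constant."""
    ratio = records["TFIDF score"] / records["PubMed occurrence"]
    return ratio.groupby(records["MeSH Symptom Term"]).agg(["min", "max"])


# Rows copied from the data3 and data_sd outputs above
sample = pd.DataFrame({
    "MeSH Symptom Term": ["Aging, Premature", "Aging, Premature", "Aging, Premature",
                          "Hirsutism", "Hirsutism"],
    "PubMed occurrence": [1, 3, 1, 1, 471],
    "TFIDF score": [3.464551, 10.393654, 3.464551, 2.483722, 1169.833173],
})
print(idf_range_per_symptom(sample))
```

On the full table, `idf_range_per_symptom(data3)` should show min ≈ max for every symptom if the pattern holds beyond these sample rows.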


Bipartite Graph

In [21]:
edges = data_sd.reindex(columns=["MeSH Symptom Term","MeSH Disease Term"])
edges
Out[21]:
        MeSH Symptom Term  MeSH Disease Term
668     Fever              Bacterial Infections
669     Fever              Bacteremia
682     Fever              Endocarditis, Bacterial
800     Fever              Tuberculosis, Pulmonary
816     Fever              Staphylococcal Infections
...     ...                ...
147641  Hirsutism          Polycystic Ovary Syndrome
147702  Hirsutism          Ovarian Neoplasms
147817  Hirsutism          Hyperandrogenism
147883  Hirsutism          Acne Vulgaris
147895  Hirsutism          Hypertrichosis

1708 rows × 2 columns

In [22]:
part0 = edges['MeSH Symptom Term'].unique()
len(part0)
Out[22]:
202
In [23]:
part1 = edges['MeSH Disease Term'].unique()
len(part1)
Out[23]:
671
In [24]:
def intersection(lst1, lst2):
    lst3 = [value for value in lst1 if value in lst2]
    return lst3
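`value in lst2` scans `lst2` once per element of `lst1`, which is quadratic in the list sizes. Converting `lst2` to a set first gives constant-time lookups on average while keeping the order of `lst1`. A drop-in sketch with the same name and signature:

```python
def intersection(lst1, lst2):
    """Items of lst1 that also occur in lst2, in lst1's order (set-based lookup)."""
    lookup = set(lst2)
    return [value for value in lst1 if value in lookup]


print(intersection(["Pain", "Fever", "Cough"], ["Cough", "Pain", "Asthma"]))  # ['Pain', 'Cough']
```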
In [25]:
# Terms that appear both as a symptom and as a disease are ambiguous
comm = intersection(part0, part1)
len(comm)
Out[25]:
102
In [26]:
# Drop rows whose symptom term is one of these ambiguous terms (they can still appear as diseases)
edges = edges[~edges['MeSH Symptom Term'].isin(comm)]
In [27]:
# Drop duplicate (symptom, disease) rows, keeping one copy of each pair
edges_ = edges.drop_duplicates(subset=["MeSH Symptom Term","MeSH Disease Term"], keep="first")
In [28]:
edges_
Out[28]:
        MeSH Symptom Term  MeSH Disease Term
668     Fever              Bacterial Infections
669     Fever              Bacteremia
682     Fever              Endocarditis, Bacterial
800     Fever              Tuberculosis, Pulmonary
816     Fever              Staphylococcal Infections
...     ...                ...
147641  Hirsutism          Polycystic Ovary Syndrome
147702  Hirsutism          Ovarian Neoplasms
147817  Hirsutism          Hyperandrogenism
147883  Hirsutism          Acne Vulgaris
147895  Hirsutism          Hypertrichosis

692 rows × 2 columns
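`drop_duplicates(keep=False)` removes every copy of a duplicated row, whereas `keep="first"` keeps one copy of each pair. On a toy frame (made-up rows) the difference looks like this:

```python
import pandas as pd

toy = pd.DataFrame({
    "MeSH Symptom Term": ["Fever", "Fever", "Cough"],
    "MeSH Disease Term": ["Sepsis", "Sepsis", "Asthma"],
})
cols = ["MeSH Symptom Term", "MeSH Disease Term"]
n_keep_false = len(toy.drop_duplicates(subset=cols, keep=False))    # both Fever/Sepsis rows are dropped
n_keep_first = len(toy.drop_duplicates(subset=cols, keep="first"))  # one Fever/Sepsis row survives
print(n_keep_false, n_keep_first)  # 1 2
```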

In [29]:
part0 = edges_['MeSH Symptom Term'].unique()
#part0
In [30]:
part1 = edges_['MeSH Disease Term'].unique()
#part1
In [31]:
## Symptom // Disease
joins = list(edges_.to_records(index=False))
#joins
In [32]:
BI = nx.Graph()
BI.add_nodes_from(part0, bipartite=0)
BI.add_nodes_from(part1, bipartite=1)
BI.add_edges_from(joins)
In [33]:
print(nx.info(BI))
Name: 
Type: Graph
Number of nodes: 480
Number of edges: 692
Average degree:   2.8833
In [34]:
# Taking the largest connected component
components = sorted(nx.connected_components(BI), key=len, reverse=True)
largest_component = components[0]
BII = BI.subgraph(largest_component)
In [35]:
print(nx.info(BII))
Name: 
Type: Graph
Number of nodes: 443
Number of edges: 670
Average degree:   3.0248
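Sorting every component just to take the first one is unnecessary; `max(..., key=len)` picks the largest directly. `G.subgraph(...)` also returns a read-only view, so a `.copy()` is needed if the component will be modified later. A small sketch; `largest_component_subgraph` is a hypothetical helper name:

```python
import networkx as nx


def largest_component_subgraph(G):
    """Independent (mutable) copy of G restricted to its largest connected component."""
    largest = max(nx.connected_components(G), key=len)
    return G.subgraph(largest).copy()


G = nx.Graph([("a", "b"), ("b", "c"), ("x", "y")])
H = largest_component_subgraph(G)
print(sorted(H.nodes()))  # ['a', 'b', 'c']
```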
In [36]:
## Symptoms // Diseases

fig = plt.figure(figsize = (30, 50))
ax = fig.add_subplot(111)
ax.axis('off')

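# Split by the 'bipartite' attribute set at construction: N1 = symptoms, N2 = diseases
# (bipartite.sets() colours the graph itself and does not guarantee which side is which)
N1 = {n for n, side in BII.nodes(data='bipartite') if side == 0}
N2 = set(BII) - N1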
pos = dict()
pos.update( (n, (1, i)) for i, n in enumerate(N1) ) # put nodes from N1
pos.update( (n, (2, i)) for i, n in enumerate(N2) ) # put nodes from N2
nx.draw(BII, pos=pos, with_labels=True)
plt.show()
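`bipartite.sets()` recomputes a two-colouring of the graph, so it does not guarantee which of the two sets holds the symptoms, and it raises `AmbiguousSolution` on a disconnected graph (the per-community graphs built later need not be connected). Reading the `bipartite` node attribute set when the nodes were added avoids both issues. The sketch below, with a hypothetical `split_sides` helper and toy nodes, also validates the split with `bipartite.is_bipartite_node_set`:

```python
import networkx as nx
from networkx.algorithms import bipartite


def split_sides(B, attr="bipartite"):
    """Split nodes by the side (0 or 1) stored in node attribute `attr`, then validate it."""
    side0 = {n for n, side in B.nodes(data=attr) if side == 0}
    side1 = set(B) - side0
    if not bipartite.is_bipartite_node_set(B, side0):
        raise ValueError("some edge joins two nodes on the same side")
    return side0, side1


# Toy graph with two connected components
B = nx.Graph()
B.add_nodes_from(["Fever", "Cough"], bipartite=0)
B.add_nodes_from(["Sepsis", "Asthma", "Influenza, Human"], bipartite=1)
B.add_edges_from([("Fever", "Sepsis"), ("Cough", "Asthma"), ("Fever", "Influenza, Human")])
symptoms, diseases = split_sides(B)
print(sorted(symptoms), sorted(diseases))  # ['Cough', 'Fever'] ['Asthma', 'Influenza, Human', 'Sepsis']
```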
In [37]:
Gph_N1 = bipartite.projected_graph(BII, N1, multigraph=False)
Gph_N2 = bipartite.projected_graph(BII, N2, multigraph=False)
print(nx.info(Gph_N1))
print(nx.info(Gph_N2))
Name: 
Type: Graph
Number of nodes: 84
Number of edges: 360
Average degree:   8.5714
Name: 
Type: Graph
Number of nodes: 359
Number of edges: 11409
Average degree:  63.5599
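`projected_graph` only records whether two diseases share a symptom, so the 11,409-edge disease projection treats one shared symptom the same as ten. `bipartite.weighted_projected_graph` stores the number of shared neighbours in each edge's `weight` attribute, which python-louvain's `best_partition` reads by default (`weight='weight'`). A toy sketch with made-up edges:

```python
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
B.add_nodes_from(["Fever", "Cough"], bipartite=0)
B.add_nodes_from(["Influenza, Human", "Pneumonia", "Asthma"], bipartite=1)
B.add_edges_from([
    ("Fever", "Influenza, Human"), ("Cough", "Influenza, Human"),
    ("Fever", "Pneumonia"), ("Cough", "Pneumonia"),
    ("Cough", "Asthma"),
])
diseases = {n for n, side in B.nodes(data="bipartite") if side == 1}
P = bipartite.weighted_projected_graph(B, diseases)  # weight = number of shared symptoms
print(P["Influenza, Human"]["Pneumonia"]["weight"])  # 2
```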

Visualization: Symptoms

In [38]:
t = time.time()
spring_pos = nx.spring_layout(Gph_N1) # might take a little while
elapsed = time.time() - t
print('Time elapsed to get the graph layout: ', elapsed)
fig = plt.figure(figsize = (40, 30))
ax = fig.add_subplot(111)
ax.axis('off')

node_size_default = 40

n = nx.draw_networkx(Gph_N1, 
                     spring_pos,
                     ax = ax,
                     node_size = node_size_default,
                     with_labels = True)
plt.title("Symptom projection - spring layout")
plt.close();

fig
Time elapsed to get the graph layout:  0.016798973083496094
Out[38]:
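`nx.spring_layout` starts from random positions, so each call (cells 38, 39, 61 and 65) produces a different picture. Passing `seed` makes a layout reproducible, and computing it once and reusing it keeps node positions comparable between the plain and the community-coloured plots. A minimal check on networkx's built-in karate club graph:

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
pos_a = nx.spring_layout(G, seed=42)  # same seed -> same initial positions -> same layout
pos_b = nx.spring_layout(G, seed=42)
same = all(np.allclose(pos_a[n], pos_b[n]) for n in G)
print(same)  # True
```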

Visualization: Diseases

In [39]:
t = time.time()
spring_pos = nx.spring_layout(Gph_N2) # might take a little while
elapsed = time.time() - t
print('Time elapsed to get the graph layout: ', elapsed)
fig = plt.figure(figsize = (40, 30))
ax = fig.add_subplot(111)
ax.axis('off')

node_size_default = 40

n = nx.draw_networkx(Gph_N2, 
                     spring_pos,
                     ax = ax,
                     node_size = node_size_default,
                     with_labels = True)
plt.title("Disease projection - spring layout")
plt.close();

fig
Time elapsed to get the graph layout:  0.26773881912231445
Out[39]:
In [40]:
# Network metric statistics
def network_metric_statistics(metric_data):
    avg = stats.mean(metric_data)
    med = stats.median(metric_data)
    std = stats.stdev(metric_data)
    
    return("Here is a quick summary of your data: average = " + '{:.5f}'.format(avg) + ", median = " + '{:.5f}'.format(med) + ", standard deviation = " + '{:.5f}'.format(std))

Degrees: Symptoms

In [41]:
#Gph_N1.degree
In [42]:
N1s = [d for d in N1]
N1s_degrees = [Gph_N1.degree[d] for d in N1]
In [43]:
N1s_order = [x for y, x in sorted(zip(N1s_degrees, N1s), reverse=True)]
N1s_degrees_order = sorted((Gph_N1.degree[d] for d in N1), reverse=True)
In [44]:
print("TOP 5 N1: \n",N1s_order[:5]) # from largest to smallest degree value
print("TOP 5 N1 DEGREES: \n",N1s_degrees_order[:5]) # from largest to smallest degree value
TOP 5 N1: 
 ['Body Weight', 'Anoxia', 'Vomiting', 'Edema', 'Nausea']
TOP 5 N1 DEGREES: 
 [45, 39, 33, 33, 30]
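The zip-and-sort in cell 43 can be collapsed into one pass over the degree view. The sketch below (hypothetical `top_k_by_degree`) uses `heapq.nlargest` with the key `(degree, name)`, which breaks ties by reverse name order, the same as `sorted(zip(degrees, names), reverse=True)`:

```python
import heapq

import networkx as nx


def top_k_by_degree(G, k=5):
    """k (node, degree) pairs with the largest degree; ties go to the larger node name."""
    return heapq.nlargest(k, G.degree, key=lambda pair: (pair[1], pair[0]))


print(top_k_by_degree(nx.star_graph(3), k=2))  # [(0, 3), (3, 1)]
```

Applied to this notebook, `top_k_by_degree(Gph_N1)` returns names and degrees together.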
In [45]:
network_metric_statistics(N1s_degrees_order)
Out[45]:
'Here is a quick summary of your data: average = 8.57143, median = 5.00000, standard deviation = 8.95812'
In [46]:
degree_count = collections.Counter(N1s_degrees_order)
deg, cnt = zip(*degree_count.items())

plt.figure(figsize=(8,5))

plt.bar(deg, cnt, width=1, color='b')
#plt.loglog(deg, cnt, 'o')
plt.xlabel("Node degree", fontsize=12)
plt.ylabel("Frequency", fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.title("Symptom projection - node degree distribution", fontsize=14)
plt.show()
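With a few nodes of very high degree, a bar chart of raw frequencies is mostly empty on the right. The complementary cumulative distribution, P(degree ≥ k), needs no binning and is often easier to read on log-log axes. A sketch with a hypothetical `degree_ccdf` helper:

```python
import collections

import numpy as np


def degree_ccdf(degrees):
    """For each distinct degree k, the fraction of nodes with degree >= k."""
    counts = collections.Counter(degrees)
    ks = sorted(counts)                                    # distinct degrees, ascending
    freq = np.array([counts[k] for k in ks], dtype=float)
    tail = np.cumsum(freq[::-1])[::-1]                     # number of nodes with degree >= k
    return np.array(ks), tail / len(degrees)


ks, ccdf = degree_ccdf([1, 1, 2, 3])
print(ks, ccdf)
```

For the symptom projection this would be `ks, ccdf = degree_ccdf(N1s_degrees_order)` followed by `plt.loglog(ks, ccdf, 'o')`.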

Degrees: Diseases

In [47]:
#Gph_N2.degree
In [48]:
N2s = [d for d in N2]
N2s_degrees = [Gph_N2.degree[d] for d in N2]
In [49]:
N2s_order = [x for y, x in sorted(zip(N2s_degrees, N2s), reverse=True)]
N2s_degrees_order = sorted((Gph_N2.degree[d] for d in N2), reverse=True)
In [50]:
print("TOP 5 N2: \n",N2s_order[:5]) # from largest to smallest degree value
print("TOP 5 N2 DEGREES: \n",N2s_degrees_order[:5]) # from largest to smallest degree value
TOP 5 N2: 
 ['Postoperative Complications', 'Pregnancy Complications', 'Pain', 'Inflammation', 'Lung Neoplasms']
TOP 5 N2 DEGREES: 
 [243, 209, 203, 199, 197]
In [51]:
network_metric_statistics(N2s_degrees_order)
Out[51]:
'Here is a quick summary of your data: average = 63.55989, median = 46.00000, standard deviation = 53.50827'
In [52]:
degree_count = collections.Counter(N2s_degrees_order)
deg, cnt = zip(*degree_count.items())

plt.figure(figsize=(8,5))

plt.bar(deg, cnt, width=1, color='b')
#plt.plot(deg, cnt, 'o')
#plt.loglog(deg, cnt, 'o')
plt.xlabel("Node degree", fontsize=12)
plt.ylabel("Frequency", fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.title("Disease projection - node degree distribution", fontsize=14)
plt.show()

Centralities: Symptoms

In [53]:
# degree centrality
deg = nx.degree_centrality(Gph_N1)
sorted(deg.items(), key=lambda item: item[1], reverse=True)[:5]
Out[53]:
[('Body Weight', 0.5421686746987953),
 ('Anoxia', 0.46987951807228917),
 ('Edema', 0.39759036144578314),
 ('Vomiting', 0.39759036144578314),
 ('Nausea', 0.3614457831325301)]
In [54]:
# closeness centrality
closeness = nx.closeness_centrality(Gph_N1)
sorted(closeness.items(), key=lambda item: item[1], reverse=True)[:5]
Out[54]:
[('Body Weight', 0.6384615384615384),
 ('Anoxia', 0.6058394160583942),
 ('Edema', 0.5971223021582733),
 ('Nausea', 0.5763888888888888),
 ('Vomiting', 0.5763888888888888)]
In [55]:
# eigenvector centrality
eig = nx.eigenvector_centrality(Gph_N1)
sorted(eig.items(), key=lambda item: item[1], reverse=True)[:5]
Out[55]:
[('Body Weight', 0.31359413760975835),
 ('Anoxia', 0.2772561379289162),
 ('Vomiting', 0.27691567684476787),
 ('Edema', 0.26364089072691727),
 ('Nausea', 0.24632470779374863)]
In [56]:
# betweenness centrality
betw = nx.betweenness_centrality(Gph_N1)
sorted(betw.items(), key=lambda item: item[1], reverse=True)[:5]
Out[56]:
[('Body Weight', 0.2051319787752347),
 ('Edema', 0.18748879157449813),
 ('Anoxia', 0.1837441263054493),
 ('Nausea', 0.16286331722488454),
 ('Abdominal Pain', 0.10754199558486524)]
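networkx normalises degree centrality by n - 1, the largest possible degree, so it ranks nodes exactly as raw degree does. As a check against the earlier outputs: the symptom projection has 84 nodes and "Body Weight" has degree 45, and the disease projection has 359 nodes and "Postoperative Complications" has degree 243.

```python
import networkx as nx

# Degree centrality = degree / (n - 1)
n_symptoms, n_diseases = 84, 359    # node counts of Gph_N1 and Gph_N2 (cell 37)
body_weight = 45 / (n_symptoms - 1)  # degree 45 from cell 44
postop = 243 / (n_diseases - 1)      # degree 243 from cell 50
print(round(body_weight, 5), round(postop, 5))  # 0.54217 0.67877

# Same rule on a star graph: the hub touches all n - 1 = 4 other nodes
star = nx.star_graph(4)
print(nx.degree_centrality(star)[0])  # 1.0
```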

Centralities: Diseases

In [57]:
# degree centrality
deg = nx.degree_centrality(Gph_N2)
sorted(deg.items(), key=lambda item: item[1], reverse=True)[:5]
Out[57]:
[('Postoperative Complications', 0.6787709497206704),
 ('Pregnancy Complications', 0.5837988826815642),
 ('Pain', 0.5670391061452514),
 ('Inflammation', 0.5558659217877095),
 ('Hypertension', 0.5502793296089385)]
In [58]:
# closeness centrality
closeness = nx.closeness_centrality(Gph_N2)
sorted(closeness.items(), key=lambda item: item[1], reverse=True)[:5]
Out[58]:
[('Postoperative Complications', 0.7261663286004056),
 ('Pregnancy Complications', 0.6884615384615385),
 ('Pain', 0.6704119850187266),
 ('Inflammation', 0.6568807339449542),
 ('Hypertension', 0.6532846715328468)]
In [59]:
# eigenvector centrality
eig = nx.eigenvector_centrality(Gph_N2)
sorted(eig.items(), key=lambda item: item[1], reverse=True)[:5]
Out[59]:
[('Postoperative Complications', 0.10949654260189175),
 ('Pregnancy Complications', 0.10469592290209802),
 ('Inflammation', 0.10396913095241815),
 ('Hypertension', 0.10383441050915593),
 ('Seizures', 0.10252108043830677)]
In [60]:
# betweenness centrality
betw = nx.betweenness_centrality(Gph_N2)
sorted(betw.items(), key=lambda item: item[1], reverse=True)[:5]
Out[60]:
[('Postoperative Complications', 0.07511089035937904),
 ('Pain', 0.05987641749548086),
 ('Pregnancy Complications', 0.0515943759797822),
 ('Lung Neoplasms', 0.03751210455488464),
 ('Retinal Diseases', 0.03697498302646217)]

Communities with Louvain: Symptoms

In [61]:
spring_pos = nx.spring_layout(Gph_N1) # might take a little while
node_size_default = 40
In [62]:
partition = community_louvain.best_partition(Gph_N1)
communities = [partition.get(node) for node in Gph_N1.nodes()]
# Community labels run from 0 to k-1, so the number of communities is max + 1
print('The number of communities is ' + str(max(communities) + 1) + '.')
The number of communities is 7.
In [63]:
# Let's assign each node to its given community
nx.set_node_attributes(Gph_N1, partition, name='community')
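Louvain labels run from 0 to k - 1, so counting distinct labels avoids the off-by-one of reporting `max(labels)`. Because the algorithm is randomised, the number of communities can change between runs, and modularity gives a score to compare them. The sketch below uses networkx's own `modularity` function on a toy graph (two triangles joined by one edge); `partition_summary` is a hypothetical helper. python-louvain also provides `community_louvain.modularity(partition, graph)` for the same purpose.

```python
import networkx as nx
from networkx.algorithms.community import modularity


def partition_summary(G, partition):
    """Return (number of communities, modularity) for a {node: label} partition."""
    groups = {}
    for node, label in partition.items():
        groups.setdefault(label, set()).add(node)
    return len(groups), modularity(G, list(groups.values()))


# Toy graph: two triangles joined by a single bridge edge (2, 3)
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
toy_partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
n_comms, q = partition_summary(G, toy_partition)
print(n_comms, round(q, 4))  # 2 0.3571
```

In the notebook, `partition_summary(Gph_N1, partition)` would report the count and modularity of the run above.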
In [64]:
colors = [Gph_N1.nodes[n]['community'] for n in Gph_N1.nodes]

fig = plt.figure(figsize = (15, 15))
ax = fig.add_subplot(111)
ax.axis('off')

n = nx.draw_networkx(Gph_N1, 
        spring_pos, 
        ax = ax, 
        node_size = node_size_default,
        with_labels = False,
        node_color = communities)

plt.close();
fig
Out[64]:

Communities with Louvain: Diseases

In [65]:
spring_pos = nx.spring_layout(Gph_N2) # might take a little while
node_size_default = 40
In [66]:
partition = community_louvain.best_partition(Gph_N2)
communities = [partition.get(node) for node in Gph_N2.nodes()]
num_communities = max(communities)+1

print('The number of communities is ' + str(num_communities))
The number of communities is 5
In [67]:
# Let's assign each node to its given community
nx.set_node_attributes(Gph_N2, partition, name='community')
In [68]:
colors = [Gph_N2.nodes[n]['community'] for n in Gph_N2.nodes]

fig = plt.figure(figsize = (15, 15))
ax = fig.add_subplot(111)
ax.axis('off')

n = nx.draw_networkx(Gph_N2, 
        spring_pos, 
        ax = ax, 
        node_size = node_size_default,
        with_labels = False,
        node_color = communities)

plt.close();
fig
Out[68]:

Analyzing the communities of Diseases

In [69]:
# Group the disease names by community label: {label: [disease, ...]}
res = {}
for disease, label in partition.items():
    res.setdefault(label, []).append(disease)
print(res)
{1: ['Kidney Calculi', 'Cholera', 'Gram-Negative Bacterial Infections', 'Adrenal Gland Neoplasms', 'Enterocolitis, Pseudomembranous', 'Prostatitis', 'Adenoma, Islet Cell', 'Intestinal Diseases, Parasitic', 'Helicobacter Infections', 'Cholestasis', 'Intestinal Obstruction', 'Gastroenteritis', 'Thrombocytopenia', 'Malabsorption Syndromes', 'Hyperandrogenism', 'Lactose Intolerance', 'Clostridium Infections', 'Encopresis', 'Cystitis, Interstitial', 'Gastrointestinal Hemorrhage', 'Anemia, Hemolytic, Autoimmune', 'AIDS-Related Opportunistic Infections', 'Fecal Incontinence', 'Colitis, Ulcerative', 'Duodenal Ulcer', 'Hypertrichosis', 'Anemia, Hemolytic', 'Ileal Diseases', 'Melena', 'Colorectal Neoplasms', 'Hepatitis A', 'Hematoma', 'Pregnancy Complications, Hematologic', 'Rotavirus Infections', 'Appendicitis', 'Ovarian Neoplasms', 'Intussusception', 'Gastrointestinal Diseases', 'Campylobacter Infections', 'Peritonitis', 'Colitis', 'Hemolytic-Uremic Syndrome', 'Hypokalemia', 'Acne Vulgaris', 'Foreign Bodies', 'Cryptosporidiosis', 'Hepatitis', 'Irritable Bowel Syndrome', 'Enteritis', 'Infarction', 'Bacterial Infections', 'Giardiasis', 'Pancreatic Neoplasms', 'Escherichia coli Infections', 'Adenocarcinoma', 'Infection', 'Rectal Diseases', 'Ureteral Calculi', 'Cross Infection', 'Colonic Diseases, Functional', 'Acquired Immunodeficiency Syndrome', 'Crohn Disease', 'Intestinal Perforation', 'Rectal Neoplasms', 'Multiple Myeloma', 'Gallstones', 'Hepatitis, Viral, Human', 'Colonic Neoplasms', 'Intestinal Diseases', 'Biliary Tract Diseases', 'Dysentery, Bacillary', 'Inflammatory Bowel Diseases', 'HIV Infections', 'Coccidiosis', 'Ureteral Diseases', 'Immunologic Deficiency Syndromes', 'Dehydration', 'Adrenal Hyperplasia, Congenital', 'Celiac Disease', 'Endometriosis', 'Microsporidiosis', 'Colonic Diseases', 'Hirschsprung Disease', 'Autoimmune Diseases', 'Nutrition Disorders', 'Salmonella Infections', 'Gastritis'],
 0: ['Corneal Diseases', 'Anisometropia', 'Cataract', 'Retinopathy of Prematurity', 'Spinal Cord Injuries', 'Strabismus', 'Obstetric Labor Complications', 'Ophthalmoplegia', 'Pregnancy Complications', 'Synovitis', 'Diabetes, Gestational', 'Osteoarthritis', 'Placenta Diseases', 'Fetal Death', 'Erythema', 'Fetal Macrosomia', 'Lymphedema', 'Ocular Motility Disorders', 'Ovarian Diseases', 'Jaundice, Neonatal', 'Developmental Disabilities', 'Temporomandibular Joint Disorders', 'Vision Disorders', 'Thrombophlebitis', 'Foot Diseases', 'Vascular Diseases', 'Autonomic Nervous System Diseases', 'Arthritis, Rheumatoid', 'Obstetric Labor, Premature', 'Pancreatitis', 'Hypertension', 'Bone Marrow Diseases', 'Pregnancy Complications, Cardiovascular', 'Refractive Errors', 'Venous Insufficiency', 'Myasthenia Gravis', 'Eyelid Diseases', 'Glaucoma', 'Ataxia', 'Retinal Diseases', 'Dermatitis, Contact', 'Orbital Fractures', 'Pregnancy Complications, Infectious', 'Congenital Abnormalities', 'Ascites', 'Osteoarthritis, Knee', 'Nephrotic Syndrome', 'Orbital Diseases'],
 3: ['Lead Poisoning', 'Diabetic Nephropathies', 'Mammary Neoplasms, Experimental', 'Osteoporosis', 'Precancerous Conditions', 'Liver Neoplasms, Experimental', 'Fetal Alcohol Syndrome', 'Glucose Intolerance', 'Starvation', 'Cardiomyopathies', 'Pregnancy in Diabetics', 'Neoplasms, Experimental', 'Fatty Liver', 'Mental Retardation', 'Obesity, Morbid', 'Amenorrhea', 'Liver Cirrhosis, Experimental', 'Acute Kidney Injury', 'Drug-Induced Liver Injury', 'Hyperlipidemias', 'Depressive Disorder', 'Radiation Injuries, Experimental', 'Epilepsy', 'Uremia', 'Osteoporosis, Postmenopausal', 'Alcoholism', 'Prenatal Exposure Delayed Effects', 'Diabetes Mellitus, Type 2', 'Carcinoma, Squamous Cell', 'Kidney Diseases', 'Infant Nutrition Disorders', 'Diabetes Mellitus, Type 1', 'Anuria', 'Protein-Energy Malnutrition', 'Burns', 'Hyperglycemia', 'Protein Deficiency', 'Urinary Bladder Neoplasms', 'Prostatic Neoplasms', 'Metabolic Syndrome X', 'Schizophrenia', 'Vitamin A Deficiency', 'Deficiency Diseases', 'Stomach Neoplasms', 'Morphine Dependence', 'Abnormalities, Drug-Induced', 'Pre-Eclampsia', 'Carcinoma, Hepatocellular', 'Liver Neoplasms', 'Kidney Neoplasms', 'Diabetes Mellitus, Experimental', 'Hyperthyroidism', 'Arthritis, Experimental', 'Substance-Related Disorders', 'Muscular Atrophy', 'Fetal Growth Retardation', 'Arteriosclerosis', 'Sarcoma, Experimental', 'Kidney Failure, Chronic', 'Diabetes Mellitus', 'Hypothyroidism', 'Arthritis', 'Bone Diseases, Metabolic', 'Child Nutrition Disorders', 'Anorexia Nervosa', 'Eating Disorders', 'Malnutrition', 'Substance Withdrawal Syndrome', 'Hyperinsulinism', 'Hypertension, Renal', 'Stomach Ulcer', 'Growth Disorders', 'Polycystic Ovary Syndrome', 'Vitamin B 6 Deficiency', 'Hypercholesterolemia', 'Toxemia', 'Hypertrophy, Left Ventricular', 'Cystic Fibrosis', 'Diabetic Neuropathies'],
 4: ['Angina Pectoris', 'Pneumonia', 'Acidosis, Respiratory', 'Esophageal Diseases', 'Ischemia', 'Coronary Disease', 'Brain Diseases', 'Pyloric Stenosis', 'Myocardial Infarction', 'Mitral Valve Insufficiency', 'Heart Defects, Congenital', 'Myocardial Reperfusion Injury', 'Reperfusion Injury', 'Heart Septal Defects, Atrial', 'Coronary Artery Disease', 'Anemia', 'Hyperventilation', 'Pulmonary Embolism', 'Carbon Monoxide Poisoning', 'Gastric Outlet Obstruction', 'Neovascularization, Pathologic', 'Asphyxia Neonatorum', 'Pulmonary Heart Disease', 'Acidosis', 'Esophageal Motility Disorders', 'Infant, Newborn, Diseases', 'Infant, Premature, Diseases', 'Respiratory Distress Syndrome, Newborn', 'Gastroesophageal Reflux', 'Migraine Disorders', 'Emphysema', 'Respiratory Insufficiency', 'Cerebrovascular Disorders', 'Anemia, Sickle Cell', 'Bronchitis', 'Ventricular Dysfunction, Left', 'Lung Diseases, Obstructive', 'Sleep Apnea Syndromes', 'Aneurysm, Dissecting', 'Hemorrhage', 'Polycythemia', 'Asthma', 'Heart Failure', 'Pulmonary Edema', 'Brain Injuries', 'Shock', 'Sudden Infant Death', 'Acute Coronary Syndrome', 'Heart Arrest', 'Myocardial Ischemia', 'Hypotension', 'Angina, Unstable', 'Pulmonary Disease, Chronic Obstructive', 'Arrhythmias, Cardiac', 'Methemoglobinemia', 'Hypertension, Pulmonary', 'Dyspnea', 'Airway Obstruction', 'Respiratory Distress Syndrome, Adult', 'Brain Neoplasms', 'Shock, Hemorrhagic', 'Fetal Diseases', 'Brain Ischemia', 'Cardiomegaly', 'Liver Diseases', 'Liver Cirrhosis', 'Heart Diseases', 'Obesity', 'Altitude Sickness', 'Torticollis', 'Apnea', 'Craniocerebral Trauma', 'Sleep Apnea, Obstructive', 'Parvoviridae Infections'],
 2: ['Cerebral Infarction', 'Respiratory Tract Infections', 'Bone Neoplasms', 'Respiratory Syncytial Virus Infections', 'Cerebral Palsy', 'Fibromyalgia', 'Pain, Postoperative', 'Peptic Ulcer', 'Streptococcal Infections', 'Agranulocytosis', 'Postoperative Complications', 'Staphylococcal Infections', 'Neurotic Disorders', 'Hodgkin Disease', 'Stroke', 'Seizures', 'Skin Diseases', 'Inflammation', 'Bronchial Diseases', 'Dementia', 'Sepsis', 'Mycoses', 'Aspergillosis', 'Back Pain', 'Peripheral Nervous System Diseases', 'Lupus Erythematosus, Systemic', 'Common Cold', 'Stress Disorders, Post-Traumatic', 'Sleep Deprivation', 'Influenza, Human', 'Mucocutaneous Lymph Node Syndrome', 'Parkinson Disease', 'Exanthema', 'Psychotic Disorders', 'Tuberculosis, Pulmonary', 'Laryngeal Neoplasms', 'Alzheimer Disease', 'Breast Neoplasms', 'Sleep Disorders', 'Hypersensitivity', 'Abscess', 'Drug Hypersensitivity', 'Bacteremia', 'Leukemia', 'Laryngeal Diseases', 'Lymphatic Diseases', 'Malaria, Falciparum', 'Somatoform Disorders', 'Pain', 'Vocal Cord Paralysis', 'Leukocytosis', 'Anxiety Disorders', 'Multiple Sclerosis', 'Erectile Dysfunction', 'Neutropenia', 'Genital Diseases, Female', 'Endocarditis, Bacterial', 'Leukemia, Myeloid, Acute', 'Lung Neoplasms', 'Malaria', 'Neuroleptic Malignant Syndrome', 'Urinary Tract Infections', 'Fatigue Syndrome, Chronic', 'Lung Diseases, Fungal', 'Cognition Disorders', 'Delirium', 'Headache', 'Lymphoma', 'Parkinsonian Disorders', 'Cough', 'Lung Diseases']}
In [70]:
print(res[0])
['Corneal Diseases', 'Anisometropia', 'Cataract', 'Retinopathy of Prematurity', 'Spinal Cord Injuries', 'Strabismus', 'Obstetric Labor Complications', 'Ophthalmoplegia', 'Pregnancy Complications', 'Synovitis', 'Diabetes, Gestational', 'Osteoarthritis', 'Placenta Diseases', 'Fetal Death', 'Erythema', 'Fetal Macrosomia', 'Lymphedema', 'Ocular Motility Disorders', 'Ovarian Diseases', 'Jaundice, Neonatal', 'Developmental Disabilities', 'Temporomandibular Joint Disorders', 'Vision Disorders', 'Thrombophlebitis', 'Foot Diseases', 'Vascular Diseases', 'Autonomic Nervous System Diseases', 'Arthritis, Rheumatoid', 'Obstetric Labor, Premature', 'Pancreatitis', 'Hypertension', 'Bone Marrow Diseases', 'Pregnancy Complications, Cardiovascular', 'Refractive Errors', 'Venous Insufficiency', 'Myasthenia Gravis', 'Eyelid Diseases', 'Glaucoma', 'Ataxia', 'Retinal Diseases', 'Dermatitis, Contact', 'Orbital Fractures', 'Pregnancy Complications, Infectious', 'Congenital Abnormalities', 'Ascites', 'Osteoarthritis, Knee', 'Nephrotic Syndrome', 'Orbital Diseases']
In [83]:
def process_one_community(community_val):

    # Keep only the edges whose disease belongs to this community
    edges_tmp = edges_[edges_['MeSH Disease Term'].isin(community_val)]
    
    syms_sorted_top = ["none"]
    percent = 0
    name_community = "none"
    
    # Only analyse communities linked to more than 3 distinct symptoms
    if len(edges_tmp['MeSH Symptom Term'].unique())>3:

        # Both sides of the bipartite graph
        part0_tmp = edges_tmp['MeSH Symptom Term'].unique()
        part1_tmp = edges_tmp['MeSH Disease Term'].unique()

        # Symptom // Disease
        joins_tmp = list(edges_tmp.to_records(index=False))

        # Start Bigraph
        BI_tmp = nx.Graph()
        BI_tmp.add_nodes_from(part0_tmp, bipartite=0)
        BI_tmp.add_nodes_from(part1_tmp, bipartite=1)
        BI_tmp.add_edges_from(joins_tmp)

        # Information
        #print(nx.info(BI_tmp))

        # Taking the largest connected component
        #components = sorted(nx.connected_components(BI_tmp), key=len, reverse=True)
        #largest_component = components[0]
        #BII_tmp = BI_tmp.subgraph(largest_component)

        # Information
        #print(nx.info(BII_tmp))

        # Graph: Symptoms // Diseases
        fig = plt.figure(figsize = (12, 12))
        ax = fig.add_subplot(111)
        ax.axis('off')
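        # Side 0 = symptoms, side 1 = diseases (bipartite.sets() raises on a disconnected
        # graph and does not guarantee which side is returned first)
        N1 = {n for n, side in BI_tmp.nodes(data='bipartite') if side == 0}
        N2 = set(BI_tmp) - N1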
        pos = dict()
        pos.update( (n, (1, i)) for i, n in enumerate(N1) ) # put nodes from N1
        pos.update( (n, (2, i)) for i, n in enumerate(N2) ) # put nodes from N2
        nx.draw(BI_tmp, pos=pos, with_labels=True)
        plt.show()

        # extract degrees
        degX, degY = bipartite.degrees(BI_tmp, N1)
        degY = dict(degY)
        syms_sorted = dict(sorted(degY.items(), key=lambda item: item[1], reverse=True))

        # printing the sorted symptoms
        for key, value in syms_sorted.items():
            print(key, ': ', value)

        for diss in list(N2):
            if diss =="Cognition Disorders":
                name_community = "Mental disorders"
            if diss =="Bacterial Infections":
                name_community = "Bacterial infections"
            if diss =="Deficiency Diseases":
                name_community = "Malnutrition deficiency"
            if diss =="Lung Diseases":
                name_community = "Lung or heart problems"
            if diss =="Corneal Diseases":
                name_community = "Vision or Pregnancy problems"
          
        print("\nName community: ",name_community)

        # Top symptoms: those linked to at least 3 diseases of the community
        syms_sorted_top = [k for k,v in syms_sorted.items() if float(v) >= 3]

        # select the edges of the top symptoms 
        edges_tmp2 = edges_tmp[edges_tmp['MeSH Symptom Term'].isin(syms_sorted_top)]

        # NOTE: both counts below are numbers of edges (symptom-disease links), not of
        # distinct diseases, so `percent` is the share of the community's links that
        # involve a top symptom
        partial_diseas_ = len(edges_tmp2['MeSH Disease Term'])
        
        # only for verification: the distinct diseases reached by the top symptoms
        print(edges_tmp2['MeSH Disease Term'].unique())

        # number of links in the community
        all_diseas_ = len(edges_tmp['MeSH Disease Term'])

        percent = round(partial_diseas_*100/all_diseas_,2)

        print("\nTop symptoms:", syms_sorted_top)

        print("\nTop symptoms detects the", str(percent), "%", "of the diseases in the community")
    
    else:
        print("\nCommunity too small")

    return syms_sorted_top, percent, name_community
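In `process_one_community`, both `partial_diseas_` and `all_diseas_` count rows of the edge table, so `percent` is the share of symptom-disease links that involve a top symptom. For community 0 the printed degrees give 57 of 78 links for the seven top symptoms (73.08 %), while the printed array shows those symptoms reach all 48 of the community's diseases. A sketch that reports both measures; `disease_coverage` is a hypothetical helper and the rows are made up:

```python
import pandas as pd


def disease_coverage(edges_df, top_symptoms,
                     sym_col="MeSH Symptom Term", dis_col="MeSH Disease Term"):
    """Return (% of distinct diseases linked to a top symptom, % of links involving one)."""
    covered = edges_df[edges_df[sym_col].isin(top_symptoms)]
    pct_diseases = 100 * covered[dis_col].nunique() / edges_df[dis_col].nunique()
    pct_links = 100 * len(covered) / len(edges_df)
    return round(pct_diseases, 2), round(pct_links, 2)


toy = pd.DataFrame({
    "MeSH Symptom Term": ["Edema", "Edema", "Diplopia", "Nausea"],
    "MeSH Disease Term": ["Ascites", "Hypertension", "Strabismus", "Hypertension"],
})
print(disease_coverage(toy, ["Edema", "Diplopia"]))  # (100.0, 75.0)
```

Inside the function, `disease_coverage(edges_tmp, syms_sorted_top)` would give both numbers for a community.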
In [84]:
name_comm_all = []
tops_syms_all = []
percs_syms_all = []

for idx in range(0,num_communities):
    print("\n\n\n####################### ####################### ####################### #######################")
    print(idx)
    tops_syms, percs_syms, name_comm = process_one_community(res[idx])
    if tops_syms != ["none"]:
        name_comm_all.append(name_comm)
        tops_syms_all.append(tops_syms)
        percs_syms_all.append(percs_syms)


####################### ####################### ####################### #######################
0
Edema :  21
Birth Weight :  14
Diplopia :  6
Amblyopia :  5
Arthralgia :  4
Reflex, Abnormal :  4
Scotoma :  3
Weight Gain :  2
Anoxia :  2
Fetal Distress :  2
Abdominal Pain :  2
Body Weight :  2
Color Vision Defects :  1
Overweight :  1
Nausea :  1
Fatigue :  1
Hemianopsia :  1
Weight Loss :  1
Hyperemesis Gravidarum :  1
Pseudophakia :  1
Vomiting :  1
Abdomen, Acute :  1
Psychophysiologic Disorders :  1

Name community:  Vision or Pregnancy problems
['Pregnancy Complications, Infectious' 'Retinopathy of Prematurity'
 'Pregnancy Complications' 'Diabetes, Gestational' 'Fetal Macrosomia'
 'Fetal Death' 'Obstetric Labor Complications'
 'Obstetric Labor, Premature' 'Placenta Diseases'
 'Pregnancy Complications, Cardiovascular' 'Hypertension'
 'Congenital Abnormalities' 'Jaundice, Neonatal'
 'Developmental Disabilities' 'Foot Diseases' 'Arthritis, Rheumatoid'
 'Synovitis' 'Pancreatitis' 'Spinal Cord Injuries' 'Corneal Diseases'
 'Eyelid Diseases' 'Orbital Diseases' 'Retinal Diseases'
 'Nephrotic Syndrome' 'Ovarian Diseases' 'Vascular Diseases'
 'Thrombophlebitis' 'Venous Insufficiency' 'Bone Marrow Diseases'
 'Lymphedema' 'Dermatitis, Contact' 'Erythema' 'Ascites'
 'Autonomic Nervous System Diseases' 'Ophthalmoplegia' 'Ataxia'
 'Strabismus' 'Vision Disorders' 'Cataract' 'Refractive Errors'
 'Anisometropia' 'Myasthenia Gravis' 'Ocular Motility Disorders'
 'Orbital Fractures' 'Glaucoma' 'Temporomandibular Joint Disorders'
 'Osteoarthritis' 'Osteoarthritis, Knee']

Top symptoms: ['Edema', 'Birth Weight', 'Diplopia', 'Amblyopia', 'Arthralgia', 'Reflex, Abnormal', 'Scotoma']

Top symptoms detects the 73.08 % of the diseases in the community



####################### ####################### ####################### #######################
1
Diarrhea :  45
Abdominal Pain :  18
Body Weight :  10
Constipation :  9
Jaundice :  8
Vomiting :  5
Abdomen, Acute :  5
Colic :  5
Purpura, Thrombocytopenic, Idiopathic :  5
Purpura, Thrombocytopenic :  4
Diarrhea, Infantile :  4
Hirsutism :  4
Dyspepsia :  4
Hypergammaglobulinemia :  3
Nausea :  3
Weight Loss :  3
Virilism :  3
Purpura, Thrombotic Thrombocytopenic :  3
Pelvic Pain :  3
Fever :  3
Psychophysiologic Disorders :  3
Fever of Unknown Origin :  2
Cachexia :  2
Hematemesis :  2
Fatigue :  1
Dysmenorrhea :  1
Birth Weight :  1

Name community:  Bacterial infections
['Bacterial Infections' 'Infection' 'HIV Infections' 'Adenocarcinoma'
 'Colonic Neoplasms' 'Pancreatic Neoplasms' 'Colitis' 'Crohn Disease'
 'Malabsorption Syndromes' 'Celiac Disease' 'Dehydration'
 'Nutrition Disorders' 'Acquired Immunodeficiency Syndrome'
 'Multiple Myeloma' 'Autoimmune Diseases'
 'Immunologic Deficiency Syndromes' 'Gastrointestinal Diseases'
 'Colitis, Ulcerative' 'Colonic Diseases, Functional'
 'Helicobacter Infections' 'Ovarian Neoplasms' 'Gallstones' 'Appendicitis'
 'Gastrointestinal Hemorrhage' 'Colonic Diseases'
 'Irritable Bowel Syndrome' 'Ileal Diseases' 'Intestinal Obstruction'
 'Intussusception' 'Intestinal Perforation' 'Endometriosis' 'Hematoma'
 'Foreign Bodies' 'Peritonitis' 'Infarction' 'Biliary Tract Diseases'
 'Kidney Calculi' 'Ureteral Diseases' 'Ureteral Calculi' 'Prostatitis'
 'Cystitis, Interstitial' 'Hirschsprung Disease' 'Rectal Diseases'
 'Fecal Incontinence' 'Encopresis' 'Gram-Negative Bacterial Infections'
 'Campylobacter Infections' 'Dysentery, Bacillary'
 'Escherichia coli Infections' 'Salmonella Infections' 'Cholera'
 'Clostridium Infections' 'Enterocolitis, Pseudomembranous'
 'Cross Infection' 'AIDS-Related Opportunistic Infections'
 'Microsporidiosis' 'Rotavirus Infections'
 'Intestinal Diseases, Parasitic' 'Cryptosporidiosis' 'Giardiasis'
 'Coccidiosis' 'Adenoma, Islet Cell' 'Colorectal Neoplasms'
 'Rectal Neoplasms' 'Gastroenteritis' 'Enteritis'
 'Inflammatory Bowel Diseases' 'Intestinal Diseases' 'Lactose Intolerance'
 'Hemolytic-Uremic Syndrome' 'Hypokalemia' 'Gastritis' 'Duodenal Ulcer'
 'Hepatitis, Viral, Human' 'Hepatitis A' 'Cholestasis' 'Hepatitis'
 'Anemia, Hemolytic' 'Pregnancy Complications, Hematologic'
 'Anemia, Hemolytic, Autoimmune' 'Thrombocytopenia'
 'Adrenal Gland Neoplasms' 'Adrenal Hyperplasia, Congenital'
 'Hyperandrogenism' 'Acne Vulgaris' 'Hypertrichosis']

Top symptoms: ['Diarrhea', 'Abdominal Pain', 'Body Weight', 'Constipation', 'Jaundice', 'Vomiting', 'Abdomen, Acute', 'Colic', 'Purpura, Thrombocytopenic, Idiopathic', 'Purpura, Thrombocytopenic', 'Diarrhea, Infantile', 'Hirsutism', 'Dyspepsia', 'Hypergammaglobulinemia', 'Nausea', 'Weight Loss', 'Virilism', 'Purpura, Thrombotic Thrombocytopenic', 'Pelvic Pain', 'Fever', 'Psychophysiologic Disorders']

Top symptoms detects the 94.34 % of the diseases in the community



####################### ####################### ####################### #######################
2
Fever :  34
Psychophysiologic Disorders :  12
Fatigue :  9
Body Weight :  7
Nausea :  6
Respiratory Sounds :  6
Hemoptysis :  6
Edema :  5
Supranuclear Palsy, Progressive :  4
Gait Disorders, Neurologic :  4
Anoxia :  4
Vomiting :  4
Hoarseness :  3
Psychomotor Agitation :  3
Confusion :  3
Diarrhea :  2
Catatonia :  2
Sensation Disorders :  2
Agnosia :  2
Weight Gain :  2
Pain, Intractable :  2
Consciousness Disorders :  1
Purpura :  1
Hypokinesia :  1
Purpura, Thrombotic Thrombocytopenic :  1
Spasm :  1
Postoperative Nausea and Vomiting :  1
Constipation :  1
Hot Flashes :  1
Fever of Unknown Origin :  1
Muscle Rigidity :  1
Cardiac Output, Low :  1
Hemianopsia :  1
Dyskinesias :  1
Weight Loss :  1
Skin Manifestations :  1
Dyspepsia :  1

Name community:  Lung or heart problems
['Bacteremia' 'Endocarditis, Bacterial' 'Tuberculosis, Pulmonary'
 'Staphylococcal Infections' 'Streptococcal Infections'
 'Respiratory Tract Infections' 'Sepsis' 'Abscess'
 'Urinary Tract Infections' 'Mycoses' 'Influenza, Human' 'Common Cold'
 'Malaria' 'Malaria, Falciparum' 'Leukemia' 'Leukemia, Myeloid, Acute'
 'Lymphoma' 'Hodgkin Disease' 'Breast Neoplasms' 'Lung Neoplasms'
 'Lung Diseases' 'Seizures' 'Pain' 'Mucocutaneous Lymph Node Syndrome'
 'Leukocytosis' 'Agranulocytosis' 'Neutropenia' 'Lymphatic Diseases'
 'Lupus Erythematosus, Systemic' 'Skin Diseases' 'Exanthema'
 'Drug Hypersensitivity' 'Inflammation' 'Postoperative Complications'
 'Fatigue Syndrome, Chronic' 'Fibromyalgia' 'Multiple Sclerosis'
 'Sleep Disorders' 'Sleep Deprivation' 'Cognition Disorders' 'Dementia'
 'Alzheimer Disease' 'Psychotic Disorders' 'Parkinson Disease'
 'Cerebral Palsy' 'Cerebral Infarction' 'Stroke' 'Delirium'
 'Parkinsonian Disorders' 'Peptic Ulcer' 'Back Pain' 'Headache'
 'Erectile Dysfunction' 'Genital Diseases, Female' 'Anxiety Disorders'
 'Stress Disorders, Post-Traumatic' 'Neurotic Disorders'
 'Somatoform Disorders' 'Laryngeal Neoplasms' 'Laryngeal Diseases'
 'Vocal Cord Paralysis' 'Pain, Postoperative' 'Aspergillosis'
 'Lung Diseases, Fungal' 'Bronchial Diseases'
 'Respiratory Syncytial Virus Infections' 'Cough' 'Hypersensitivity']

Top symptoms: ['Fever', 'Psychophysiologic Disorders', 'Fatigue', 'Body Weight', 'Nausea', 'Respiratory Sounds', 'Hemoptysis', 'Edema', 'Supranuclear Palsy, Progressive', 'Gait Disorders, Neurologic', 'Anoxia', 'Vomiting', 'Hoarseness', 'Psychomotor Agitation', 'Confusion']

Top symptoms detects the 79.71 % of the diseases in the community



####################### ####################### ####################### #######################
3
Body Weight :  78
Weight Loss :  9
Edema :  7
Birth Weight :  7
Weight Gain :  6
Bulimia :  3
Overweight :  2
Hyperphagia :  2
Vomiting :  2
Dyspepsia :  2
Oliguria :  2
Psychophysiologic Disorders :  2
Psychomotor Agitation :  1
Fetal Weight :  1
Catatonia :  1
Fatigue :  1
Jaundice :  1
Abdominal Pain :  1
Fetal Hypoxia :  1
Anorexia :  1
Akathisia, Drug-Induced :  1
Colic :  1
Fever :  1
Hirsutism :  1
Thinness :  1

Name community:  Malnutrition deficiency
['Toxemia' 'Polycystic Ovary Syndrome' 'Sarcoma, Experimental'
 'Carcinoma, Hepatocellular' 'Carcinoma, Squamous Cell'
 'Stomach Neoplasms' 'Liver Neoplasms' 'Liver Neoplasms, Experimental'
 'Mammary Neoplasms, Experimental' 'Prostatic Neoplasms'
 'Kidney Neoplasms' 'Urinary Bladder Neoplasms' 'Neoplasms, Experimental'
 'Precancerous Conditions' 'Bone Diseases, Metabolic' 'Osteoporosis'
 'Osteoporosis, Postmenopausal' 'Arthritis' 'Arthritis, Experimental'
 'Stomach Ulcer' 'Drug-Induced Liver Injury' 'Fatty Liver'
 'Liver Cirrhosis, Experimental' 'Cystic Fibrosis' 'Epilepsy'
 'Mental Retardation' 'Muscular Atrophy' 'Diabetic Neuropathies'
 'Kidney Diseases' 'Diabetic Nephropathies' 'Hypertension, Renal'
 'Acute Kidney Injury' 'Kidney Failure, Chronic' 'Uremia'
 'Fetal Alcohol Syndrome' 'Fetal Growth Retardation' 'Pre-Eclampsia'
 'Pregnancy in Diabetics' 'Prenatal Exposure Delayed Effects'
 'Hypertrophy, Left Ventricular' 'Cardiomyopathies' 'Arteriosclerosis'
 'Abnormalities, Drug-Induced' 'Diabetes Mellitus'
 'Diabetes Mellitus, Experimental' 'Diabetes Mellitus, Type 1'
 'Diabetes Mellitus, Type 2' 'Hyperglycemia' 'Glucose Intolerance'
 'Hyperinsulinism' 'Metabolic Syndrome X' 'Hyperlipidemias'
 'Hypercholesterolemia' 'Child Nutrition Disorders'
 'Infant Nutrition Disorders' 'Malnutrition' 'Deficiency Diseases'
 'Vitamin A Deficiency' 'Vitamin B 6 Deficiency' 'Protein Deficiency'
 'Protein-Energy Malnutrition' 'Starvation' 'Obesity, Morbid'
 'Hyperthyroidism' 'Hypothyroidism' 'Eating Disorders' 'Anorexia Nervosa'
 'Depressive Disorder' 'Schizophrenia' 'Substance-Related Disorders'
 'Growth Disorders' 'Amenorrhea' 'Alcoholism' 'Morphine Dependence'
 'Lead Poisoning' 'Substance Withdrawal Syndrome' 'Burns'
 'Radiation Injuries, Experimental']

Top symptoms: ['Body Weight', 'Weight Loss', 'Edema', 'Birth Weight', 'Weight Gain', 'Bulimia']

Top symptoms detects the 81.48 % of the diseases in the community



####################### ####################### ####################### #######################
4
Anoxia :  55
Body Weight :  15
Chest Pain :  14
Hypercapnia :  7
Vomiting :  6
Birth Weight :  6
Edema :  5
Spasm :  5
Cyanosis :  3
Neurologic Manifestations :  3
Cardiac Output, Low :  3
Psychophysiologic Disorders :  3
Fatigue :  2
Respiratory Sounds :  2
Fever :  2
Unconsciousness :  2
Hydrops Fetalis :  2
Jaundice :  2
Heart Murmurs :  2
Snoring :  2
Hypoventilation :  1
Consciousness Disorders :  1
Nausea :  1
Cachexia :  1
Bulimia :  1
Abdominal Pain :  1
Hypothermia :  1
Thinness :  1
Weight Gain :  1
Overweight :  1
Hyperphagia :  1
Edema, Cardiac :  1
Weight Loss :  1
Hemoptysis :  1
Cheyne-Stokes Respiration :  1
Dyspepsia :  1
Fetal Hypoxia :  1
Heartburn :  1
Persistent Vegetative State :  1

Name community:  none
['Liver Diseases' 'Liver Cirrhosis' 'Asthma' 'Hypertension, Pulmonary'
 'Sleep Apnea Syndromes' 'Heart Defects, Congenital' 'Heart Diseases'
 'Cardiomegaly' 'Heart Failure' 'Myocardial Ischemia' 'Coronary Disease'
 'Coronary Artery Disease' 'Myocardial Infarction'
 'Infant, Newborn, Diseases' 'Obesity'
 'Respiratory Distress Syndrome, Newborn' 'Fetal Diseases'
 'Asphyxia Neonatorum' 'Infant, Premature, Diseases'
 'Ventricular Dysfunction, Left' 'Methemoglobinemia' 'Hemorrhage'
 'Ischemia' 'Brain Neoplasms' 'Brain Diseases' 'Cerebrovascular Disorders'
 'Esophageal Diseases' 'Torticollis' 'Angina Pectoris'
 'Esophageal Motility Disorders' 'Gastroesophageal Reflux'
 'Pulmonary Embolism' 'Dyspnea' 'Acute Coronary Syndrome'
 'Angina, Unstable' 'Aneurysm, Dissecting' 'Gastric Outlet Obstruction'
 'Pyloric Stenosis' 'Migraine Disorders' 'Bronchitis'
 'Lung Diseases, Obstructive' 'Pulmonary Disease, Chronic Obstructive'
 'Pneumonia' 'Pulmonary Edema' 'Respiratory Distress Syndrome, Adult'
 'Altitude Sickness' 'Apnea' 'Sleep Apnea, Obstructive' 'Hyperventilation'
 'Respiratory Insufficiency' 'Airway Obstruction' 'Brain Injuries'
 'Brain Ischemia' 'Heart Septal Defects, Atrial' 'Arrhythmias, Cardiac'
 'Myocardial Reperfusion Injury' 'Heart Arrest' 'Pulmonary Heart Disease'
 'Hypotension' 'Reperfusion Injury' 'Anemia' 'Anemia, Sickle Cell'
 'Polycythemia' 'Acidosis' 'Sudden Infant Death' 'Emphysema'
 'Shock, Hemorrhagic' 'Neovascularization, Pathologic' 'Shock'
 'Carbon Monoxide Poisoning' 'Acidosis, Respiratory']

Top symptoms: ['Anoxia', 'Body Weight', 'Chest Pain', 'Hypercapnia', 'Vomiting', 'Birth Weight', 'Edema', 'Spasm', 'Cyanosis', 'Neurologic Manifestations', 'Cardiac Output, Low', 'Psychophysiologic Disorders']

Top symptoms detects the 78.12 % of the diseases in the community
In [73]:
# Show the names of all communities
print(name_comm_all)
['Vision or Pregnancy problems', 'Bacterial infections', 'Lung or heart problems', 'Malnutrition deficiency', 'none']
In [74]:
# Show the top symptoms of each community
print(tops_syms_all)
[['Edema', 'Birth Weight', 'Diplopia', 'Amblyopia', 'Arthralgia', 'Reflex, Abnormal', 'Scotoma'], ['Diarrhea', 'Abdominal Pain', 'Body Weight', 'Constipation', 'Jaundice', 'Vomiting', 'Abdomen, Acute', 'Colic', 'Purpura, Thrombocytopenic, Idiopathic', 'Purpura, Thrombocytopenic', 'Diarrhea, Infantile', 'Hirsutism', 'Dyspepsia', 'Hypergammaglobulinemia', 'Nausea', 'Weight Loss', 'Virilism', 'Purpura, Thrombotic Thrombocytopenic', 'Pelvic Pain', 'Fever', 'Psychophysiologic Disorders'], ['Fever', 'Psychophysiologic Disorders', 'Fatigue', 'Body Weight', 'Nausea', 'Respiratory Sounds', 'Hemoptysis', 'Edema', 'Supranuclear Palsy, Progressive', 'Gait Disorders, Neurologic', 'Anoxia', 'Vomiting', 'Hoarseness', 'Psychomotor Agitation', 'Confusion'], ['Body Weight', 'Weight Loss', 'Edema', 'Birth Weight', 'Weight Gain', 'Bulimia'], ['Anoxia', 'Body Weight', 'Chest Pain', 'Hypercapnia', 'Vomiting', 'Birth Weight', 'Edema', 'Spasm', 'Cyanosis', 'Neurologic Manifestations', 'Cardiac Output, Low', 'Psychophysiologic Disorders']]
In [75]:
# Summary statistics of the coverage percentages (statistics.stdev is the sample standard deviation)
avg = stats.mean(percs_syms_all)
med = stats.median(percs_syms_all)
std = stats.stdev(percs_syms_all)
print("Percentages:", percs_syms_all)
print("Avg:", round(avg,2))
print("Std:", round(std,2))
print("Med:", round(med,2))
Percentages: [73.08, 94.34, 79.71, 81.48, 78.12]
Avg: 81.35
Std: 7.91
Med: 79.71
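The percentages above can be reproduced from the per-symptom degree counts printed for each community. They measure the share of a community's disease-symptom edges that involve a top symptom, rather than the share of distinct diseases. The notation here is introduced only for this worked check: $d_c(s)$ is the degree of symptom $s$ inside community $c$, $S_c$ is the set of its symptoms, and $T_c$ is the set of its top symptoms.

$$
\mathrm{coverage}_c = 100 \times \frac{\sum_{s \in T_c} d_c(s)}{\sum_{s \in S_c} d_c(s)}, \qquad T_c = \{\, s \in S_c : d_c(s) \ge 3 \,\}
$$

$$
\mathrm{coverage}_0 = 100 \times \frac{21+14+6+5+4+4+3}{78} = 100 \times \frac{57}{78} \approx 73.08, \qquad \mathrm{coverage}_1 = 100 \times \frac{150}{159} \approx 94.34
$$

The reported spread is the sample standard deviation ($n - 1$ in the denominator) over the $n = 5$ percentages:

$$
\bar{x} = \frac{406.73}{5} \approx 81.35, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2} = \sqrt{\frac{250.27}{4}} \approx 7.91
$$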

Predict the disease category based on symptoms

In [76]:
df = pd.DataFrame({'Symptoms': tops_syms_all, 'Disease Category': name_comm_all, 'Percentage': percs_syms_all})
In [77]:
df
Out[77]:
Symptoms Disease Category Percentage
0 [Edema, Birth Weight, Diplopia, Amblyopia, Art... Vision or Pregnancy problems 73.08
1 [Diarrhea, Abdominal Pain, Body Weight, Consti... Bacterial infections 94.34
2 [Fever, Psychophysiologic Disorders, Fatigue, ... Lung or heart problems 79.71
3 [Body Weight, Weight Loss, Edema, Birth Weight... Malnutrition deficiency 81.48
4 [Anoxia, Body Weight, Chest Pain, Hypercapnia,... none 78.12
In [78]:
def pred_categ_disease(symptoms):
    # For each query symptom, print every community whose top-symptom list contains it,
    # together with that community's coverage percentage
    for symt in symptoms:
        for idx in range(len(df)):
            if symt in df["Symptoms"][idx]:
                diss_categ = df["Disease Category"][idx]
                percent_categ = df["Percentage"][idx]
                print("Predicted disease: "+diss_categ)
                print("Top symptoms detects the "+str(percent_categ)+" % of the diseases in the community")
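`pred_categ_disease` only prints its matches, and a query with several symptoms lists each match separately. The function below is a hypothetical variant, sketched in plain Python, that returns the matches instead. It ranks communities by how many of the query symptoms appear in their top-symptom lists and breaks ties by the community's coverage percentage. That ranking rule is an assumption added here, not part of the original notebook. The community names, percentages and top-symptom lists are copied from the `name_comm_all`, `percs_syms_all` and `tops_syms_all` outputs above. `rank_categories` and `COMMUNITIES` are illustrative names.

```python
# (community name, coverage %, top symptoms), copied from the outputs above
COMMUNITIES = [
    ("Vision or Pregnancy problems", 73.08,
     ["Edema", "Birth Weight", "Diplopia", "Amblyopia", "Arthralgia", "Reflex, Abnormal", "Scotoma"]),
    ("Bacterial infections", 94.34,
     ["Diarrhea", "Abdominal Pain", "Body Weight", "Constipation", "Jaundice", "Vomiting",
      "Abdomen, Acute", "Colic", "Purpura, Thrombocytopenic, Idiopathic", "Purpura, Thrombocytopenic",
      "Diarrhea, Infantile", "Hirsutism", "Dyspepsia", "Hypergammaglobulinemia", "Nausea", "Weight Loss",
      "Virilism", "Purpura, Thrombotic Thrombocytopenic", "Pelvic Pain", "Fever",
      "Psychophysiologic Disorders"]),
    ("Lung or heart problems", 79.71,
     ["Fever", "Psychophysiologic Disorders", "Fatigue", "Body Weight", "Nausea", "Respiratory Sounds",
      "Hemoptysis", "Edema", "Supranuclear Palsy, Progressive", "Gait Disorders, Neurologic", "Anoxia",
      "Vomiting", "Hoarseness", "Psychomotor Agitation", "Confusion"]),
    ("Malnutrition deficiency", 81.48,
     ["Body Weight", "Weight Loss", "Edema", "Birth Weight", "Weight Gain", "Bulimia"]),
    ("none", 78.12,
     ["Anoxia", "Body Weight", "Chest Pain", "Hypercapnia", "Vomiting", "Birth Weight", "Edema", "Spasm",
      "Cyanosis", "Neurologic Manifestations", "Cardiac Output, Low", "Psychophysiologic Disorders"]),
]

def rank_categories(symptoms, communities=COMMUNITIES):
    """Hypothetical helper: rank communities by the number of query symptoms in their top list."""
    query = set(symptoms)  # ignore repeated query symptoms
    hits = []
    for name, percent, top_symptoms in communities:
        score = sum(1 for s in query if s in top_symptoms)
        if score:
            hits.append((name, score, percent))
    # Most matched symptoms first; ties broken by higher coverage percentage
    hits.sort(key=lambda h: (-h[1], -h[2]))
    return hits

print(rank_categories(["Nausea", "Diarrhea"]))
# [('Bacterial infections', 2, 94.34), ('Lung or heart problems', 1, 79.71)]
```

For single symptoms, this returns the same communities that `pred_categ_disease` printed for "Weight Loss", "Nausea", "Dyspepsia" and "Chest Pain". Returning a list rather than printing makes the result easy to reuse or evaluate.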
In [79]:
pred_categ_disease(["Weight Loss"])
Predicted disease: Bacterial infections
Top symptoms detects the 94.34 % of the diseases in the community
Predicted disease: Malnutrition deficiency
Top symptoms detects the 81.48 % of the diseases in the community
In [80]:
pred_categ_disease(["Nausea"])
Predicted disease: Bacterial infections
Top symptoms detects the 94.34 % of the diseases in the community
Predicted disease: Lung or heart problems
Top symptoms detects the 79.71 % of the diseases in the community
In [81]:
pred_categ_disease(["Dyspepsia"])
Predicted disease: Bacterial infections
Top symptoms detects the 94.34 % of the diseases in the community
In [82]:
pred_categ_disease(["Chest Pain"])
Predicted disease: none
Top symptoms detects the 78.12 % of the diseases in the community
In [ ]: